class: title-slide, right, top
background-image: url(data:image/png;base64,#img/moon.JPG)
background-position: 90% 75%, 75% 75%
background-size:cover

.left-column[
# GRS Workshop<br>Introduction to ggplot
]

.right-column[
### Getting started - why ggplot?

**Eugene Hickey**<br>
March 14th 2023
]

.palegrey[.left[.footnote[Graphic by [Elaine Hickey](https://photos.google.com/photo/AF1QipMjKNoaxyne8nte4HmxA6Th9-4fUfSbl_mx-_1G)]]]

???

Welcome to the workshop on ggplot, where we'll show you how to create impressive data visualisations.

---
name: about-me
layout: false
class: about-me-slide, inverse, middle, center

# About me

<img style="border-radius: 50%;" src="data:image/png;base64,#img/eugene.jpg" width="150px"/>

## Eugene Hickey

### lecturer in physics

.fade[Technological University<br>Dublin]

[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 
73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> www.fizzics.ie](https://www.fizzics.ie) [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @eugene100hickey](https://twitter.com/eugene100hickey) [<svg viewBox="0 0 496 512" style="position:relative;display:inline-block;top:.1em;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 
23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> eugene100hickey](https://github.com/eugene100hickey) --- layout: true <a class="footer-link" href="http://grs-2023.netlify.app">data-visualisation-grs2023 — Eugene Hickey</a> <!-- this adds the link footer to all slides, depends on footer-link class in css--> --- class: top # Acknowledgments .pull-left-narrow[.center[<img style="border-radius: 50%;" src="img/Giovanna.jpeg">]] .pull-right-wide[ [Giovanna](https://www.tudublin.ie/research/postgraduate-research/graduate-research-school/meet-the-team/giovannarampazzo.html), co-pilot for this workshop and administrator of the Graduate Research School. 
]

--

.pull-left-narrow[.center[
<img style="border-radius: 50%;" src="data:image/png;base64,#img/tudublin-logo.jpg" width="125px"/>]]

.pull-right-wide[
[Graduate Research School](https://www.tudublin.ie/research/postgraduate-research/graduate-research-school/) for the opportunity to provide this workshop
]

--

.pull-left-narrow[.center[
<svg viewBox="0 0 496 512" style="position:relative;display:inline-block;top:.1em;fill:#e5bf00;height:3em;" xmlns="http://www.w3.org/2000/svg"> <path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm0 448c-110.3 0-200-89.7-200-200S137.7 56 248 56s200 89.7 200 200-89.7 200-200 200zm-80-216c17.7 0 32-14.3 32-32s-14.3-32-32-32-32 14.3-32 32 14.3 32 32 32zm160 0c17.7 0 32-14.3 32-32s-14.3-32-32-32-32 14.3-32 32 14.3 32 32 32zm4 72.6c-20.8 25-51.5 39.4-84 39.4s-63.2-14.3-84-39.4c-8.5-10.2-23.7-11.5-33.8-3.1-10.2 8.5-11.5 23.6-3.1 33.8 30 36 74.1 56.6 120.9 56.6s90.9-20.6 120.9-56.6c8.5-10.2 7.1-25.3-3.1-33.8-10.1-8.4-25.3-7.1-33.8 3.1z"></path></svg>]]

.pull-right-wide[
- [xaringan 📦](https://github.com/yihui/xaringan#xaringan) developed by Yihui Xie
- [flipbookr 📦](https://github.com/EvaMaeRey/flipbookr) developed by Gina Reynolds
- [learnr 📦](https://github.com/rstudio/learnr) developed by Garrick Aden-Buie
]

---

# Target Audience

- graduate students looking for better ways to present their data
- people currently using tools like MS Excel for visualisations

---

# Why R?

- working with a mouse isn't reproducible
    - difficult to log exactly what you've done
    - hard to repeat for a series of diagrams
    - difficult to be inspired by other people's work
- good to separate sources of data and the visualisations that display them
- R uses a series of commands that input, manipulate, and display data
- lots of contributors around the world, diverse fields

---

# Course Contents

1. System Configuration - installing software
2. Using RStudio
3. Introduction to R
4. Getting and Cleaning Data
5.
   Exploratory Analysis - making rough plots
6. Different Types of Plots
7. Playing with Aesthetics
8. Using Plotting Themes
9. Advanced Topics - Maps, Networks

---

## Why We're Here

- Alternative to Excel and Tableau
- Enables Reproducible Research
- Can Make Lots of Plots Quickly
- Good for Exploratory Analysis
- Publication-Ready Figures

---

## And.... a gateway to so much more

- data capture
- statistical analysis
- machine learning
- artificial intelligence
- writing your thesis
- writing a blog

---

## Not Why We're Here

- Won't discuss choices for data presentation
- Nor good practices in visualisations
    - but these are sort of in the background
- This isn't a machine learning course
    - but lots of the techniques we'll use are relevant
- So, this course is about skills development; how you use these skills is up to you.

---

## We said we wouldn't discuss this....but

- Graphics are important, overlooked, and inconsistent - the last mile of data analysis
- Need to tell a story
- Can be misleading, almost always by accident
- Choice of colours - we'll spend some time on this
- Choice of fonts
- Keep it simple - reduce amount of ink
- Increasing number of options for showcasing your data

---

# Lots of add-on packages for ggplot

gg.gap, ggalignment, ggallin, ggalluvial, ggalt, ggamma, gganimate, ggarchery, ggasym, ggbeeswarm, ggblanket, ggborderline, ggbrain, ggbreak, ggBubbles, ggbuildr, ggbump, ggchangepoint, ggcharts, ggChernoff, ggcleveland, ggcorrplot, ggcorset, ggcoverage, ggdag, ggdark, ggDCA, ggdemetra, ggdendro, ggdensity, ggdist, ggdmc, ggDoE, ggDoubleHeat, gge, ggeasy, ggedit, ggeffects, ggenealogy, ggESDA, ggetho, ggExtra, ggfan, ggfittext, ggfocus, ggforce, ggformula, ggfortify, ggfun, ggfx, gggap, gggenes, ggghost, gggibbous, gggrid, ggh4x, gghalfnorm, gghalves, gghdr, ggheatmap, gghighlight, gghilbertstrings, ggHoriPlot, ggimage, ggimg, gginference, gginnards, ggip, ggiraph, ggiraphExtra, ggisotonic, ggjoy, gglasso, gglgbtq, gglm, gglorenz, ggm, ggmap,
ggmapinset, ggmatplot, ggmcmc, ggmice, ggmix, ggmosaic, ggmotif, ggmr, ggmuller, ggmulti, ggnetwork, ggnewscale, ggnormalviolin, ggnuplot, ggOceanMaps, ggokabeito, ggpackets, ggpage, ggparallel, ggparliament, ggparty, ggpath, ggpattern, ggpcp, ggperiodic, ggpie, ggplate, ggplot.multistats, ggplot2, ggplot2movies, ggplotAssist, ggplotgui, ggplotify, ggplotlyExtra, ggpmisc, ggPMX, ggpointdensity, ggpointless, ggpol, ggpolar, ggpolypath, ggpp, ggprism, ggpubr, ggpval, ggQC, ggQQunif, ggquickeda, ggquiver, ggrain, ggRandomForests, ggraph, ggraptR, ggrasp, ggrastr, ggredist, ggrepel, ggResidpanel, ggridges, ggrisk, ggroups, ggsci, ggseas, ggsector, ggseg, ggseg3d, ggseqlogo, ggseqplot, ggshadow, ggside, ggsignif, ggsn, ggsoccer, ggsolvencyii, ggsom, ggspatial, ggspectra, ggstance, ggstar, ggstats, ggstatsplot, ggstream, ggstudent, ggsurvey, ggsurvfit, ggswissmaps, ggtea, ggtern, ggtext, ggThemeAssist, ggthemes, ggtikz, ggTimeSeries, ggtrace, ggtrendline, ggupset, ggvenn, ggVennDiagram, ggversa, ggvis, ggvoronoi, ggwordcloud, ggx

---

# And other packages make ggplots that can then be modified like any other

.pull-left[
```r
# a ggplot object, assumed to have been built earlier with factoextra::fviz_cluster()
fviz_cluster_example
```
]

.pull-right[
```r
# theme_clean() is assumed to come from the ggthemes package
fviz_cluster_example + theme_clean()
```
]

---

# Other reasons

- ggplots are easy to make publication-ready
- easier to make a sequence of visualisations
- fits in nicely with the rest of the tidyverse

---
class: center, inverse

# Let's Begin

---

## Install R and RStudio

---

### *R*

- Go to [*CRAN*](https://cran.r-project.org/)

### *RStudio*

- this is the IDE we will use (and pretty much everyone else uses)
- R is the engine, RStudio is the cockpit
- download from [*RStudio*](https://rstudio.com/products/rstudio/download/)

---

## Using RStudio

- toolbar across the top - I don't use this very much
- set of quick links below that - top left (green plus sign) is about the only one I use
- 4 Panes
    - top left for files or looking at data
    - bottom left for the console
    - top right for
      *Environment* - tells what variables are stored
    - bottom right for plots and help

---

[This](https://learnr-examples.shinyapps.io/ex-setup-r/#section-welcome) is a nice tutorial suite that explains how to install R and RStudio.

---

<img src="data:image/png;base64,#img/rstudio.PNG" height="600px" width="800px" align="center"/>

---

- usual workflow is:
    - try commands out at the console (bottom left)
    - when that works, store them in a file (top left)
    - when a sequence of commands works, put them into a document (also top left)

---

## Extending R

- installing R just gives you *base* R
- beauty of this tool lies with *packages*
- we'll look at installing these from three sources:
    - CRAN
    - Bioconductor
    - github

---

- [CRAN](https://cran.r-project.org/)
    - for example, at the console type `install.packages("tidyverse")`
    - this installs the tidyverse package (or rather, family of packages)
    - over 20k packages on CRAN (see list [here](http://cran.nexr.com/web/packages/available_packages_by_name.html))
    - sometimes esoteric ([engsoccerdata](http://cran.nexr.com/web/packages/engsoccerdata/index.html))
    - sometimes cutting edge ([deep learning](http://cran.nexr.com/web/packages/keras/index.html))
    - each package heavily curated and maintained

---

- [Bioconductor](https://www.bioconductor.org)
    - set of bioinformatics packages (lots of genomics)
    - start with `install.packages("BiocManager")`
    - then `BiocManager::install("some_genomics_package")` to install a package
    - list of packages [here](http://bioconductor.org/packages/release/BiocViews.html)
    - about 3,000 packages, including genome builds

---

- github
    - packages in development
    - start with `install.packages("devtools")`
    - then `devtools::install_github("developer_name/package_name")`
    - almost 80k packages [here](http://rpkg.gepuro.net/)
    - the package *githubinstall* is useful to search these

---
background-position: center
background-size: contain
class: center, inverse

# Resources

- [Big Book of R](https://www.bigbookofr.com/index.html)

---

- books
    - *recommended
      text* **Data Visualization** by Kieran Healy (ISBN = 978-0691181622). ~€25. Also online at [https://socviz.co/index.html](https://socviz.co/index.html)
    - [Hadley's book, R for Data Science](https://r4ds.had.co.nz/)
    - [Hadley's book on ggplot2](https://ggplot2-book.org/)
    - [Data Visualization by Wilke](https://serialmentor.com/dataviz/); lots of his actual code is on github at [https://github.com/clauswilke/practical_ggplot2](https://github.com/clauswilke/practical_ggplot2)
    - check out the list of online books at [bookdown.org](https://bookdown.org)

<img src="data:image/png;base64,#img/hadley.jpg" height="100px" width="100px" align="right"/>

---

- websites
    - Karl Broman (https://www.biostat.wisc.edu/~kbroman/), and particularly [this presentation](https://www.biostat.wisc.edu/~kbroman/presentations/graphs_MDPhD2014.pdf)
    - course by Boehmke on github [github.com/uc-r/Intro-R](https://github.com/uc-r/Intro-R)
    - the good people at RStudio have lots of help at [resources.rstudio.com/](https://resources.rstudio.com/)
    - [Cedric Scherer's ggplot2 tutorial](https://cedricscherer.netlify.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/)
    - [The R Graph Gallery](https://www.r-graph-gallery.com/index.html) is pretty good and worth checking out

<br/> <br/> <br/>
<img src="data:image/png;base64,#https://github.com/yihui/xaringan/releases/download/v0.0.2/karl-moustache.jpg" height="80px" width="100px" align="right"/>

---

- Blogs and Podcasts
    - [www.simplystatistics.org](http://www.simplystatistics.org)
    - [varianceexplained.org](http://varianceexplained.org/)
    - [Not So Standard Deviations](http://nssdeviations.com/)
    - [Thomas Lin Pedersen video](https://github.com/thomasp85/ggplot2_workshop)

<br/> <br/> <br/> <br/> <br/> <br/>
<img src="data:image/png;base64,#img/hillary.jpeg" height="100px" width="100px" align="right"/>

---

- Online Courses
    - Coursera: [Data Science from Johns Hopkins](https://www.coursera.org/specializations/jhu-data-science).
      The course notes are on [github](http://datasciencespecialization.github.io/)
    - edx.org [course from Irizarry](https://www.edx.org/course/data-science-visualization)
    - [datacamp](https://www.datacamp.com)

<br/> <br/> <br/> <br/> <br/> <br/>
<img src="data:image/png;base64,#img/rafael.jpg" height="100px" width="100px" align="right"/>

---

- Miscellaneous
    - [Dublin R MeetUp](https://www.meetup.com/DublinR/)
    - [RWeekly.org](https://rweekly.org), round up of events in the world of R
    - [#Rstats on twitter](https://twitter.com/search?q=%23rstats&src=typed_query)
    - [#TidyTuesday](https://twitter.com/search?q=%23TidyTuesday&src=typeahead_click) on twitter
    - [R Cheatsheets](https://rstudio.com/resources/cheatsheets/)
    - if you get stuck, google is your friend; it often sends you to stackoverflow.com or stackexchange.com

---

- Some stuff about graphics in general
    - [again, from Irizarry](http://genomicsclass.github.io/book/pages/plots_to_avoid.html)
    - [hit parade of graphs in R](https://www.r-graph-gallery.com/index.html)
    - [Cedric Scherer again](https://cedricscherer.netlify.com/)
    - some stuff from [Christian Burkhard](https://ggplot2tor.com/make_any_plot_look_better/make_any_plot_look_better/)
    - and from [Laura Ellis](https://www.littlemissdata.com/)
    - and from [Peter Aldhous](http://paldhous.github.io/ucb/2016/dataviz/)
    - [colours in R](https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf)
    - cool book on good graphics from [Stephen Few](https://nces.ed.gov/programs/slds/pdf/08_F_06.pdf)
    - [The Glamour of Graphics](https://www.williamrchase.com/slides/assets/player/KeynoteDHTMLPlayer.html#0) talk from last month's RStudio Conference
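
---

The three installation routes from the *Extending R* slides (CRAN, Bioconductor, github) can be pulled together in one short script. This is only a sketch under stated assumptions, not workshop code: `missing_packages()` is a hypothetical helper introduced here for illustration, and the Bioconductor and github names are the placeholders from those slides, so every install call is left commented out.

```r
# Hypothetical helper (not part of the workshop material):
# returns the names in `pkgs` that are not yet installed in this R library.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# CRAN: the packages named on the earlier slides
cran_wanted <- c("tidyverse", "BiocManager", "devtools")
to_install <- missing_packages(cran_wanted)
print(to_install)
# if (length(to_install) > 0) install.packages(to_install)

# Bioconductor: swap the placeholder for a real package name first
# BiocManager::install("some_genomics_package")

# github: swap the placeholder for a real developer/package pair first
# devtools::install_github("developer_name/package_name")
```

Checking what is missing before installing keeps the script safe to re-run at the start of every session; uncomment the install lines once the placeholders are replaced.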